Learning Procedural Planning Knowledge

Author

  • Douglas J. Pearson
Abstract

Autonomous agents operating in complex, rapidly changing environments can improve their task performance if they update and correct their world model over their lifetime. Existing research on this problem falls into two classes. The first consists of reinforcement learners, which use weak inductive methods to directly modify an agent’s procedural execution knowledge. These systems are robust in dynamic and complex environments, but they generally do not support planning or the pursuit of multiple goals, and their weak methods make them slow to learn. The second class, theory revision systems, learns declarative planning knowledge through stronger methods that use explicit reasoning to identify and correct errors in the agent’s domain knowledge. However, these methods are generally applicable only to agents with instantaneous actions in fully sensed domains. This research explores learning procedural planning knowledge through deliberate reasoning about the correctness of an agent’s knowledge. Because the system, IMPROV, uses a procedural knowledge representation, it can be extended efficiently to complex actions that have duration and multiple conditional effects, taking it beyond the scope of traditional theory revision systems. In addition, deliberate reasoning about correctness leads to stronger, more directed learning than is possible in reinforcement learners. An IMPROV agent’s planning knowledge is represented by production rules that encode the preconditions and actions of operators. Plans are also represented procedurally, as rule sets that efficiently guide the agent in making local decisions during execution. Learning occurs during plan execution whenever the agent’s knowledge is insufficient to determine the next action to take. This is a weaker method than traditional plan monitoring, where incorrect predictions trigger the correction method; it is used because prediction-based methods perform poorly in stochastic environments.
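The rule-based representation and the learning trigger described above can be illustrated with a minimal Python sketch. It is not IMPROV's implementation: the `Operator` class, the `select_operator` and `execute` functions, and the toy traffic-light domain are all hypothetical names chosen for illustration. The sketch treats "insufficient knowledge" as the case where the rules do not single out exactly one applicable operator, which is an assumption about how the trigger might be realised.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

State = Dict[str, object]


@dataclass
class Operator:
    """Hypothetical operator: one rule proposes it, one rule applies it."""
    name: str
    precondition: Callable[[State], bool]  # when the operator may be selected
    apply: Callable[[State], State]        # how the operator changes the state


def select_operator(state: State, operators: List[Operator]) -> Optional[Operator]:
    """Return the single applicable operator, or None if knowledge is insufficient."""
    candidates = [op for op in operators if op.precondition(state)]
    return candidates[0] if len(candidates) == 1 else None


def execute(state: State, operators: List[Operator],
            goal: Callable[[State], bool],
            max_steps: int = 10) -> Tuple[State, List[str], bool]:
    """Run local decisions until the goal holds; report whether learning is needed."""
    trace: List[str] = []
    for _ in range(max_steps):
        if goal(state):
            return state, trace, False
        op = select_operator(state, operators)
        if op is None:
            # No unique next action: this is where correction would be triggered.
            return state, trace, True
        state = op.apply(state)
        trace.append(op.name)
    return state, trace, False


# Toy domain (assumed for illustration): stop the car at a red light.
OPERATORS = [
    Operator("brake",
             lambda s: s["light"] == "red" and s["speed"] > 0,
             lambda s: {**s, "speed": max(0, s["speed"] - 10)}),
    Operator("accelerate",
             lambda s: s["light"] == "green" and s["speed"] < 30,
             lambda s: {**s, "speed": s["speed"] + 10}),
]


def at_rest(state: State) -> bool:
    return state["speed"] == 0


_, trace, needs_learning = execute({"speed": 20, "light": "red"}, OPERATORS, at_rest)
print(trace, needs_learning)   # ['brake', 'brake'] False

_, trace, needs_learning = execute({"speed": 20, "light": "yellow"}, OPERATORS, at_rest)
print(trace, needs_learning)   # [] True
```

In the second call no rule covers a yellow light, so execution stops without any prediction having been checked. That is the sense in which the trigger depends on missing knowledge and not on a failed prediction.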
IMPROV’s method for correcting domain knowledge is based primarily on correcting operator preconditions. It does this by generating and executing alternative plans in decreasing order of expected likelihood of reaching the current goal. Once a successful plan has been discovered, IMPROV uses an inductive learning module to correct the preconditions of the operators used in the set of k plans (successes and failures). Each operator, together with whether it led to success or failure, is used as a training instance. This k-incremental learning is based on the last k instances and yields the incremental performance required in time-critical domains. K-incremental learning is stronger than traditional reinforcement learning because the differences between successful and failed plans lead to better credit assignment: they help determine which operator(s) were incorrect in the failed plans and how each operator’s planning knowledge was wrong. Actions are corrected by recursively reusing the precondition correction method. The agent’s domain knowledge is encoded as a hierarchy of operators of progressively smaller grain size. The most primitive operators manipulate only a single symbol, which guarantees that their actions are correct. Incorrect actions at higher levels are corrected by changing the preconditions of the sub-operators that implement them. For example, the effects of a brake operator are encoded as more primitive operators that modify the car’s speed, tire condition, and so on. IMPROV’s correction method is applied recursively to change the preconditions of these sub-operators and thereby correct the planning knowledge associated with the brake operator’s actions. This method allows IMPROV to learn complex actions with durations and conditional effects. The system has been tested on a robotic simulation and in driving a simulated car.
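The idea of learning from a window of the last k training instances can be sketched as follows. This is only an illustration under stated assumptions. The abstract does not name the inductive learner, so the sketch swaps in a very simple one: keep the feature values shared by every successful use of an operator. The class `KIncrementalLearner`, its methods, and the braking data are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Tuple

Features = Dict[str, str]
# (operator name, state features when it was chosen, did the plan succeed?)
Instance = Tuple[str, Features, bool]


class KIncrementalLearner:
    """Hypothetical learner that only ever sees the last k training instances."""

    def __init__(self, k: int) -> None:
        # deque(maxlen=k) silently discards the oldest instance when full.
        self.window: Deque[Instance] = deque(maxlen=k)

    def record(self, operator: str, features: Features, success: bool) -> None:
        self.window.append((operator, dict(features), success))

    def induce_precondition(self, operator: str) -> Features:
        """Assumed stand-in learner: features shared by all successes in the window."""
        positives = [f for op, f, ok in self.window if op == operator and ok]
        if not positives:
            return {}
        shared = dict(positives[0])
        for f in positives[1:]:
            shared = {key: v for key, v in shared.items() if f.get(key) == v}
        return shared

    def excludes_failures(self, operator: str, precondition: Features) -> bool:
        """True if every failed instance in the window violates the precondition."""
        negatives = [f for op, f, ok in self.window if op == operator and not ok]
        return all(any(f.get(key) != v for key, v in precondition.items())
                   for f in negatives)


learner = KIncrementalLearner(k=4)
learner.record("brake", {"road": "wet", "speed": "high"}, success=False)
learner.record("brake", {"road": "dry", "speed": "high"}, success=True)
learner.record("brake", {"road": "dry", "speed": "low"}, success=True)

precondition = learner.induce_precondition("brake")
print(precondition)                                      # {'road': 'dry'}
print(learner.excludes_failures("brake", precondition))  # True
```

Comparing the successful instances with the failed one isolates the road condition as the feature that matters, which is a small-scale version of the credit assignment argument made above. Bounding the window at k keeps the cost of each update fixed, at the price of forgetting older evidence.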
We have demonstrated that k-incremental learning outperforms single-instance incremental learning and that a procedural representation supports correcting complex, non-instantaneous actions. We have also shown noise tolerance, tolerance to a large, evolving target domain theory, and learning in time-constrained environments.

Publication date: 1999